1983) or grammatical roles (Suri & McCoy, 1994; 
Dahl & Ball, 1990)). Dahl & Ball (1990) improve 
the focusing mechanism by simplifying its data struc- 
tures and, thus, their proposal is more closely related 
to the centering model than any other focusing mecha- 
nism. But their approach still relies upon grammatical 
information for the ordering of the centering list, while 
we use only the functional information structure as the 
guiding principle. 

6 Conclusion 

In this paper, we provided an account for ordering 
the forward-looking centers which is entirely based on 
functional notions, grounded on the information struc- 
ture of utterances in a discourse. We motivated our 
proposal by the constraints which hold for a free word 
order language such as German and derived our results 
from data-intensive empirical studies of (real-world) 
expository texts. We have gathered preliminary evi- 
dence that the functional ordering of discourse enti- 
ties in the centers seems to coincide with the gram- 
matical roles of fixed word order languages. We also 
augmented the ordering criteria of the forward-looking 
center such that it accounts not only for (pro)nominal 
but also for functional anaphora (textual ellipsis), an 
issue that, so far, has only been sketchily dealt with 
in the centering framework. The extensions we pro- 
pose have been validated by the empirical analysis of 
real-world expository texts of considerable length. We 
thus follow methodological principles of corpus-based 
studies that have been successfully exercised in the 
work of Passonneau (1993). Still open are proper de- 
scriptions of deictic expressions, proper names (cf. the 
Alfa Romeo driving scenario), and plural or generic 
definite noun phrases. An anaphora resolution module
and an ellipsis handler based on this functional center-
ing model have been implemented as part of a compre-
hensive text parser for German.
               CONTINUE    RETAIN      SMOOTH-SHIFT  ROUGH-SHIFT
CONTINUE       cheap       cheap       expensive     expensive
RETAIN         expensive   expensive   cheap         expensive
SMOOTH-SHIFT   cheap       cheap       expensive     expensive
ROUGH-SHIFT    expensive   expensive   cheap         expensive



Table 9: Costs for Transition Pairs 





          cost type   naive   naive &          canonical   canonical &      functional
                              ante > express               ante > express
IT        cheap       72      180              129         236              321
          expensive   317     209              260         153              68
Spiegel   cheap       25      36               45          51               62
          expensive   50      39               30          24               13
Müller    cheap       45      48               46          48               55
          expensive   34      31               33          31               24
Σ         cheap       142     264              220         335              438
          expensive   401     279              323         208              105



Table 10: Cost Values for Centering Transition Pair Types 



verbs (Walker et al., 1994). However, the results our 
constraints generate are the same as those generated by 
Walker et al. including these model extensions. Only a 
single problematic case remains, viz. example (30) of 
Walker et al. (1994, p.214) causes the same problems 
they described (discourse-initial utterance, semantic 
or world knowledge should be available). Even for 
the crucial examples (32)-(36) of Walker et al. (1994,
pp. 216-221) our constraints generate the same Cfs as
Walker et al.'s constraints with ZTA.

To summarize the results of our empirical evalua- 
tion, we first claim that our proposal based on func- 
tional criteria leads to substantially better and — with 
respect to the inference load placed on the text under- 
stander, whether human or machine — more plausi- 
ble results for languages with free word order than the 
structural constraints given by Grosz et al. (1995) and 
those underlying a naive approach. We base these ob- 
servations on an evaluation approach which considers 
transition pairs in terms of the inference load specific 
pairs imply. Second, we have gathered some evidence, 
still far from being conclusive, that the functional con- 
straints on centering seem to incorporate the struc- 
tural constraints for English and the modified struc- 
tural constraints for Japanese. Hence, we hypothesize 
that functional constraints on centering might consti- 
tute a general mechanism for treating free and fixed 
word order languages by the same descriptive mecha- 
nism. This claim, however, has to be further substan- 
tiated by additional cross-linguistic empirical studies. 

5 Comparison with Related Approaches 

The centering model (Grosz et al., 1983; 1995) is con-
cerned with the interactions between the local coher-
ence of discourse and the choices of referring expres-
sions. Crucial for the centering model is the way
the forward-looking centers are organized. Despite
several cross-linguistic studies, a kind of "standard"
has emerged based on the study of English (cf. Table 1
in Section 1). Only a few of these cross-linguistic
studies have led to changes in the basic order of dis-
course entities, the work of Walker et al. (1990;
1994) being the most far-reaching exception. They
consider the role of expressive means in Japanese to
indicate topic status and the speaker's perspective,
thus introducing functional notions, viz. TOPIC and
EMPATHY, into the discussion. German, the object
language we deal with, is also a free word order lan- 
guage like Japanese (possibly even more constrained). 
Our basic revision of the ordering scheme completely 
abandons grammatical role information and replaces it 
with entirely functional notions reflecting the informa- 
tion structure of the utterances in the discourse. Inter- 
estingly enough, several extra assumptions introduced 
to account, e.g., for anaphora parallelism (e.g., the 
shared property constraint formulated by Kameyama 
(1986)) can be eliminated without affecting the cor- 
rectness of anaphora resolutions. Rambow (1993) has 
presented a theme/rheme distinction within the cen- 
tering model to which we fully subscribe. His pro- 
posal concerning the centering analysis of German (al- 
ready referred to as the "naive" approach; cf. Section 
4) is limited, however, to the German middlefield and, 
hence, incomplete. 

A common topic of criticism relating to focusing 
approaches to anaphora resolution has been the diver- 
sity of data structures they require, which are likely 
to hide the underlying linguistic regularities. Focus- 
ing algorithms prefer the discourse element already 
in focus for anaphora resolution, thus considering 
context-boundedness, too. But the items of the fo- 
cus lists are either ordered by thematic roles (Sidner, 





          Transition Types           naive      naive &          canonical   canonical &      functional
                                                ante > express               ante > express
IT        CONTINUE                   49         167              102         197              309
          RETAIN                     269        158              226         131              25
          SMOOTH-SHIFT               32         41               24          35               51
          ROUGH-SHIFT                39         23               37          26               4
          Errors                     69         70               68          69               67
Spiegel   CONTINUE                   17         28               37          43               50
          RETAIN                     42         32               28          23               12
          SMOOTH-SHIFT               9          9                7           8                13
          ROUGH-SHIFT                7          6                3           1                0
          Errors                     18         19               16          17               16
Müller    CONTINUE                   31         31               32          32               36
          RETAIN                     19         19               18          18               15
          SMOOTH-SHIFT               15         17               15          16               18
          ROUGH-SHIFT                14         12               14          13               10
          Errors                     22         22               22          22               22
Σ         CONTINUE                   97         226              171         272              395
          RETAIN                     330        209              272         172              52
          SMOOTH-SHIFT               56         67               46          59               82
          ROUGH-SHIFT                60         41               54          40               14
          Errors (specific errors)   109 (10)   111 (12)         106 (7)     108 (9)          105 (6)

Table 8: Numbers of Centering Transitions 



line of argumentation, we here propose to classify all 
occurrences of centering transition pairs with respect 
to the costs they imply. The cost-based evaluation
of different Cf orderings refers to evaluation criteria
which form an intrinsic part of the centering model 6.

Transition pairs hold for two immediately succes-
sive utterances. We distinguish between two types of
transition pairs, cheap ones and expensive ones. We
call a transition pair cheap if the backward-looking
center of the current utterance is correctly predicted
by the preferred center of the immediately preced-
ing utterance, i.e., $C_b(U_i) = C_p(U_{i-1})$, $i = 2 \ldots n$.
Transition pairs are called expensive if the backward-
looking center of the current utterance is not correctly
predicted by the preferred center of the immediately
preceding utterance, i.e., $C_b(U_i) \neq C_p(U_{i-1})$,
$i = 2 \ldots n$. Table 9 contains a detailed synopsis of cheap
and expensive transition pairs. In particular, chains 
of the RETAIN transition in passages where the Cb 
does not change (passages with constant theme) show 
that the canonical ordering constraints for the forward- 
looking centers are not appropriate. 
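
The cheap/expensive criterion can be illustrated with a short Python sketch. It is not the implementation described in this paper: the `Utterance` record and the function names are assumptions introduced for illustration, and each entity is assumed to be already resolved to a concept identifier. The sample data are the Cb/Cf values of text fragment (2) as given in Table 4.

```python
from typing import Dict, List, NamedTuple, Optional


class Utterance(NamedTuple):
    """Hypothetical record holding the centering data of one utterance."""
    cb: Optional[str]   # backward-looking center (None if undefined)
    cf: List[str]       # forward-looking centers, highest ranked first


def is_cheap(prev: Utterance, cur: Utterance) -> bool:
    """Cheap iff Cb(U_i) equals Cp(U_{i-1}), the first element of Cf(U_{i-1})."""
    return bool(prev.cf) and cur.cb == prev.cf[0]


def count_costs(utterances: List[Utterance]) -> Dict[str, int]:
    """Count cheap and expensive pairs over immediately successive utterances."""
    counts = {"cheap": 0, "expensive": 0}
    for prev, cur in zip(utterances, utterances[1:]):
        counts["cheap" if is_cheap(prev, cur) else "expensive"] += 1
    return counts


# Cb/Cf data of utterances (2a)-(2c), taken from Table 4.
fragment_2 = [
    Utterance("DELL-316LT", ["DELL-316LT", "NIMH-ACCU"]),
    Utterance("DELL-316LT", ["NIMH-ACCU", "DELL-316LT", "TIME-UNIT-PAIR", "POWER"]),
    Utterance("NIMH-ACCU", ["NIMH-ACCU", "CHARGE-TIME", "TIME-UNIT-PAIR"]),
]
print(count_costs(fragment_2))  # {'cheap': 2, 'expensive': 0}
```

In this fragment the RETAIN in (2b) is followed by a SMOOTH-SHIFT in (2c), and the pair counts as cheap because the preferred center of (2b) already predicts the backward-looking center of (2c).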

6 As a consequence of this postulate, we have to rede-
fine Rule 2 of the Centering Constraints (Grosz et al., 1995,
p.215) appropriately, which gives an informal characteriza-
tion of a preference for sequences of CONTINUE over se-
quences of RETAIN and, similarly, sequences of RETAIN
over sequences of SHIFT. Our specification for the case of
text interpretation says that cheap transitions are preferred
over expensive ones, with cheap and expensive transitions
as defined in Table 9.

The numbers of centering transition pairs generated
by the different approaches are shown in Table 10. In
general, the functional approach shows the best
results, while the naive and the canonical approaches
work reasonably well for the literary text, but exhibit 
a poor performance for the texts from the IT domain 
and the news magazine. The results for the latter ap- 
proaches become only slightly more positive with the 
modification of ranking the antecedent of a textual el- 
lipsis above the elliptical expression, but they do not 
compare to the results of the functional approach. 
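
To make the comparison easier to read, the totals of Table 10 can be turned into proportions of cheap transition pairs. The following Python sketch does only this arithmetic; the dictionary layout and the function name are assumptions made for illustration, while the counts are the sum rows of Table 10.

```python
# (cheap, expensive) totals over all three text sorts, from the sum rows of Table 10.
totals = {
    "naive": (142, 401),
    "naive & ante > express": (264, 279),
    "canonical": (220, 323),
    "canonical & ante > express": (335, 208),
    "functional": (438, 105),
}


def cheap_share(cheap: int, expensive: int) -> float:
    """Fraction of transition pairs that are cheap."""
    return cheap / (cheap + expensive)


shares = {name: cheap_share(c, e) for name, (c, e) in totals.items()}
for name, share in shares.items():
    print(f"{name:28s} {share:6.1%}")
```

Each approach is evaluated on the same 543 transition pairs, so the shares are directly comparable: roughly 81% of the pairs are cheap under the functional ordering, against about 26% for the naive and 41% for the canonical ordering.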

We were also interested in finding out whether the 
functional ordering we propose possibly "includes" 
the grammatical role based criteria discussed so far. 
We, therefore, re-evaluated the examples already an- 
notated with Cb/Cf data available in the literature 
(for the English language, we considered all exam- 
ples from Grosz et al. (1995) and Brennan et al. 
(1987); for Japanese we took the data from Walker 
et al. (1994)). Surprisingly enough, all examples of 
Grosz et al. (1995) passed the test successfully. Only 
with respect to the troublesome Alfa Romeo driving 
scenario (cf. Brennan et al. (1987, p. 157)) our con- 
straints fail to properly rank the elements of the third 
sentence Cf of that example. 7 Note also that these 
results were achieved without having recourse to ex- 
tra constraints, e.g., the shared property constraint to 
account for anaphora parallelism (Kameyama, 1986). 

We applied our constraints to Japanese examples in 
the same way. Again we abandoned all extra con- 
straints set up in these studies, e.g., the Zero Topic As- 
signment (ZTA) rule and the special role of empathy 

7 In essence, the very specific problem addressed by that 
example seems to be that Friedman has not been previously 
introduced in the local discourse segment and is only acces- 
sible via the global focus. 



constraint that elliptical antecedents are ranked higher 
than elliptical expressions (short: "ante > express"). 

For the evaluation of a centering algorithm on nat- 
urally occurring text it is necessary to specify how to 
deal with complex sentences. In particular, methods 
for the interaction between intra- and intersentential 
anaphora resolution have to be defined, since the cen- 
tering model is concerned only with the latter case (see 
Suri & McCoy (1994)). We use an approach as de- 
scribed by Strube (1996) for the evaluation. 

Since most of the anaphors in these texts are nom- 
inal anaphors, the resolution of which is much more 
restricted than that of pronominal anaphors, the rate of 
success for the whole anaphora resolution process is 
not significant enough for a proper evaluation of the 
functional constraints. The reason for this lies in the 
fact that nominal anaphors are far more constrained by 
conceptual criteria than pronominal anaphors. So the 
chance to properly resolve a nominal anaphor, even 
at lower ranked positions in the center lists, is greater 
than for pronominal anaphors. By shifting our
evaluation criteria away from simple anaphora resolu-
tion success data to structural conditions based on the
proper ordering of center lists (in particular, we focus
on the most highly ranked item of the forward-looking
centers), we compensate for the high propor-
tion of nominal anaphora that occur in our test sets.
The types of centering transitions we make use of (cf. 
Table 7) are taken from Walker et al. (1994). 





                             $C_b(U_n) = C_b(U_{n-1})$      $C_b(U_n) \neq C_b(U_{n-1})$
                             OR $C_b(U_{n-1})$ undef.

$C_b(U_n) = C_p(U_n)$        CONTINUE                       SMOOTH-SHIFT

$C_b(U_n) \neq C_p(U_n)$     RETAIN                         ROUGH-SHIFT



Table 7: Transition Types 
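
The four transition types of Walker et al. (1994) amount to a two-way case distinction, which the following Python sketch spells out. The function name and its signature are assumptions made for illustration; an undefined backward-looking center of the preceding utterance is represented by `None`. The sample values are the Cb and Cp entries of text fragment (2) in Table 4.

```python
from typing import Optional


def classify_transition(cb_prev: Optional[str], cb_cur: str, cp_cur: str) -> str:
    """Classify the transition into U_n from Cb(U_{n-1}), Cb(U_n) and Cp(U_n)."""
    # First condition: the Cb is kept, or Cb(U_{n-1}) is undefined.
    same_cb = cb_prev is None or cb_cur == cb_prev
    if cb_cur == cp_cur:
        return "CONTINUE" if same_cb else "SMOOTH-SHIFT"
    return "RETAIN" if same_cb else "ROUGH-SHIFT"


# (Cb, Cp) of utterances (2a)-(2c) as listed in Table 4.
fragment_2 = [
    ("DELL-316LT", "DELL-316LT"),
    ("DELL-316LT", "NIMH-ACCU"),
    ("NIMH-ACCU", "NIMH-ACCU"),
]

cb_prev = None  # no preceding utterance is modelled in this sketch
transitions = []
for cb, cp in fragment_2:
    transitions.append(classify_transition(cb_prev, cb, cp))
    cb_prev = cb
print(transitions)  # ['CONTINUE', 'RETAIN', 'SMOOTH-SHIFT']
```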



4.2 Evaluation Results 

In Table 8 we give the numbers of centering transi- 
tions between the utterances in the three test sets. The 
first column contains those which are generated by the 
naive approach (such a proposal was made by Gordon 
et al. (1993) as well as by Rambow (1993) who, nev- 
ertheless, restricts it to the German middlefield only). 
We simply ranked the elements of Cf according to 
their text position. While it is usually assumed that the 
elliptical expression ranks above its antecedent (Grosz 
et al., 1995, p.217), we assume the contrary. The sec- 
ond column contains the results of this modification 
with respect to the naive approach. In the third column 
of Table 8 we give the numbers of transitions which 
are generated by the canonical constraints as stated by 
Grosz et al. (1995, p.214, 217). The fourth column 
supplies the results of the same modification as was
used for the naive approach, viz. elliptical antecedents
are ranked higher than elliptical expressions. The fifth 
column shows the results which are generated by the 
functional constraints from Table 2. 

First, we examine the error data for anaphora res- 
olution for the five cases. All approaches have 99 
errors in common. These are due to underspecifica- 
tions at different levels, e.g., the failure to account 
for prepositional anaphors (16), plural anaphors (8), 
anaphors which refer to a member of a set (14), sen- 
tence anaphors (21), and anaphors which refer to the 
global focus (12). Only 6 errors of the functional ap- 
proach are directly caused by an inappropriate order- 
ing of the Cf, while the naive approach leads to 10 
errors and the canonical to 7. When the antecedent of 
an elliptical expression is ranked above the elliptical 
expression itself the error rate of these two augmented 
approaches increases to 12 and 9, respectively. 

We now turn to the distribution of transition types 
for the different approaches. The centering model as- 
sumes a preference order among these transitions, e.g., 
CONTINUE ranks above RETAIN and RETAIN ranks 
above SHIFT. This preference order reflects the pre- 
sumed inference load put on the hearer or speaker 
to coherently decode or encode a discourse. Since 
the functional approach generates a larger amount 
of CONTINUE transitions, we interpret this as a first 
rough indication that this approach provides for more 
efficient processing than its competitors. 

But this reasoning is not entirely conclusive. Count- 
ing single occurrences of transition types, in general, 
does not reveal the entire validity of the center lists. 
Instead, considering adjacent transition pairs gives a 
more reliable picture, since depending on the text sort 
considered (e.g., technical vs. news magazine vs. lit- 
erary texts) certain sequences of transition types may 
be entirely plausible, though they include transitions 
which, when viewed in isolation, seem to imply con- 
siderable inferencing load (cf. Table 8). For instance, 
a CONTINUE transition which follows a CONTINUE 
transition is a sequence which requires the lowest pro- 
cessing costs. But a CONTINUE transition which fol- 
lows a RETAIN transition implies higher processing 
costs than a SMOOTH-SHIFT transition following a 
RETAIN transition. This is due to the fact that a RE- 
TAIN transition ideally predicts a SMOOTH-SHIFT in 
the following utterance. In this case the SMOOTH- 
SHIFT is the "least effort" transition, because only the 
first element of the Cf of the preceding utterance has 
to be checked to perform the SMOOTH-SHIFT transi- 
tion, while in the case of CONTINUE at least one more 
check has to be performed. Hence, we claim that no 
one particular centering transition is preferred over an- 
other. Instead, we postulate that some centering tran- 
sition pairs are preferred over others. Following this 



(1a)  Cb: DELL-316LT: 316LT                                            CONTINUE
      Cf: [DELL-316LT: 316LT, RESERVE-BATTERY-PACK: Reserve-Batteriepack,
           TIME-UNIT-PAIR: 2 Minuten, POWER: Strom]

(1b)  Cb: DELL-316LT: —                                                CONTINUE
      Cf: [DELL-316LT: —, ACCU: Akku, STATUS: Status, USER: Anwender]

(1c)  Cb: DELL-316LT: Rechner                                          CONTINUE
      Cf: [DELL-316LT: Rechner, ACCU: —, DISCHARGE: Entleerung,
           TIME-UNIT-PAIR: 30 Minuten, TIME-UNIT-PAIR: 5 Sekunden]

(1d)  Cb: DELL-316LT: er                                               CONTINUE
      Cf: [DELL-316LT: er, LOW-BATTERY-LED: Low-Battery-LED]



Table 3: Centering Data for Text Fragment (1) 



(2a)  Cb: DELL-316LT: 316LT                                            CONTINUE
      Cf: [DELL-316LT: 316LT, NIMH-ACCU: NiMH-Akku]

(2b)  Cb: DELL-316LT: Rechner                                          RETAIN
      Cf: [NIMH-ACCU: Akku, DELL-316LT: Rechner, TIME-UNIT-PAIR: 4 Stunden,
           POWER: Strom]

(2c)  Cb: NIMH-ACCU: —                                                 SMOOTH-SHIFT
      Cf: [NIMH-ACCU: —, CHARGE-TIME: Ladezeit, TIME-UNIT-PAIR: 1,5 Stunden]



Table 4: Centering Data for Text Fragment (2) 



(2) a. Der 316LT wird mit einem NiMH-Akku bestückt.
(The 316LT is - with a NiMH-accumulator -
equipped.)

b. Durch diesen neuartigen Akku wird der Rechner
für ca. 4 Stunden mit Strom versorgt.
(Because of this new type of accumulator - is
the computer - for approximately 4 hours - with
power - provided.)

c. Darüberhinaus ist die Ladezeit mit 1,5 Stunden
sehr kurz.
(Also - is - the charge time of 1.5 hours - quite
short.)

Given these basic relations, we may formulate the 
composite relation > IS (Table 5). It states the condi- 
tions for the comprehensive ordering of items on Cf 
(x and y denote lexical heads).



$>_{IS}$ := { (x > y) |
    if x and y both represent the same type of IS pattern
        then the relation $>_{prec}$ applies to x and y
    else if x and y both represent different forms
            of bound elements
        then the relation $>_{IS_{bound}}$ applies to x and y
    else the relation $>_{IS_{base}}$ applies to x and y }



Table 5: Information Structure Relation 
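
One way to read the composite relation is as a two-part sort key: the information structure pattern of an entity decides first, and linear precedence decides among entities of the same pattern. The Python sketch below follows this reading. It is an illustration under stated assumptions, not the implemented parser: the pattern labels, the `Entity` record and the convention that an elliptical antecedent inherits the text position of its elliptical expression are all introduced here. The sample data are those of utterance (1c).

```python
from dataclasses import dataclass
from typing import List

# Rank of each information structure pattern; lower values are preferred.
# The tiers follow Table 2; the labels themselves are hypothetical.
IS_RANK = {
    "anaphor": 0,
    "possessive_pronoun": 1,
    "elliptical_antecedent": 1,
    "elliptical_expression": 2,
    "head_of_anaphoric_expression": 2,
    "unbound": 3,
}


@dataclass(frozen=True)
class Entity:
    concept: str    # concept the expression has been resolved to
    pattern: str    # key of IS_RANK
    position: int   # text position of the lexical head in the utterance


def rank_cf(entities: List[Entity]) -> List[Entity]:
    """Order entities by IS pattern first and by linear precedence second."""
    return sorted(entities, key=lambda e: (IS_RANK[e.pattern], e.position))


# Utterance (1c), listed in text order.
utterance_1c = [
    Entity("TIME-UNIT-PAIR (30 Minuten)", "unbound", 1),
    Entity("DISCHARGE", "elliptical_expression", 2),
    Entity("ACCU", "elliptical_antecedent", 2),
    Entity("DELL-316LT", "anaphor", 3),
    Entity("TIME-UNIT-PAIR (5 Sekunden)", "unbound", 4),
]
print([e.concept for e in rank_cf(utterance_1c)])
```

The resulting order (DELL-316LT, ACCU, DISCHARGE, then the two time expressions) agrees with the Cf listed for (1c) in Table 3.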



4 Evaluation 

In this section, we first describe the empirical and 
methodological framework in which our evaluation 
experiments were embedded, and then turn to a dis- 
cussion of evaluation results and the conclusions we 
draw from the data. 



4.1 Evaluation Framework 

The test set for our evaluation experiment consisted of 
three different text sorts: 15 product reviews from the 
information technology (IT) domain (one of the two 
main corpora at our lab), one article from the German 
news magazine Der Spiegel, and the first two chapters 
of a short story by the German writer Heiner Müller 4.
The evaluation was carried out manually in order to
circumvent error chaining 5. Table 6 summarizes the
total numbers of anaphors, textual ellipses, utterances 
and words in the test set. 





          anaphors   ellipses   utterances   words
IT        308        294        451          5542
Spiegel   102        25         82           1468
Müller    153        20         87           867
Σ         563        339        620          7877



Table 6: Test Set 



Given this test set, we compared three major ap- 
proaches to centering, viz. the original model whose 
ordering principles are based on grammatical role in- 
dicators only (the so-called canonical model) as char- 
acterized by Table 1, an "intermediate" model which 
can be considered a naive approach to free word order 
languages, and, of course, the functional model based 
on information structure constraints as stated in Table 
2. For reasons discussed below, augmented versions 
of the naive and the canonical approaches will also be 
considered. They are characterized by the additional 

4 Liebesgeschichte. In Heiner Müller. Geschichten aus
der Produktion 2. Berlin: Rotbuch Verlag, pp. 57-63.

5 A performance evaluation of the current anaphora and
ellipsis resolution capacities of our system is reported in
Hahn et al. (1996).



The main difference between Grosz et al.'s work 
and our proposal concerns the criteria for ranking the 
forward-looking centers. While Grosz et al. assume 
that grammatical roles are the major determinant for 
the ranking on the Cf, we claim that for languages
with relatively free word order (such as German), it
is the functional information structure (IS) of the ut-
terance in terms of the context-boundedness or un-
boundedness of discourse elements. The centering
data structures and the notion of context-boundedness
can be used to redefine Daneš' (1974a) trichotomy be-
tween given information, theme and new information
(rheme). The $C_b(U_n)$, the most highly ranked element
of $C_f(U_{n-1})$ realized in $U_n$, corresponds to the el-
ement which represents the given information. The
theme of $U_n$ is represented by the preferred center
$C_p(U_n)$, the most highly ranked element of $C_f(U_n)$.
The theme/rheme hierarchy of $U_n$ is represented by
$C_f(U_n)$ which, in our approach, is partly deter-
mined by the $C_f(U_{n-1})$: the rhematic elements of $U_n$
are the ones not contained in $C_f(U_{n-1})$ (unbound dis-
course elements); they express the new information in
$U_n$. The ones contained in $C_f(U_{n-1})$ and $C_f(U_n)$
(bound discourse elements) are thematic, with the
theme/rheme hierarchy corresponding to the ranking
in the Cfs. The distinction between context-bound
and unbound elements is important for the ranking
on the Cf, since bound elements are generally ranked
higher than any other non-anaphoric elements (cf. also
Hajicova et al. (1992)). 
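
These definitions translate directly into operations on ordered lists. The following Python sketch is an illustration only: the function names are assumptions, and entities are assumed to be already resolved to concept identifiers, so that "realized in" reduces to list membership. The sample lists are the forward-looking centers of (2a) and (2b) from Table 4.

```python
from typing import List, Optional, Tuple


def preferred_center(cf: List[str]) -> Optional[str]:
    """Cp(U_n): the most highly ranked element of Cf(U_n)."""
    return cf[0] if cf else None


def backward_looking_center(cf_prev: List[str], cf_cur: List[str]) -> Optional[str]:
    """Cb(U_n): the most highly ranked element of Cf(U_{n-1}) realized in U_n."""
    realized = set(cf_cur)
    for entity in cf_prev:  # cf_prev is ordered, highest ranked first
        if entity in realized:
            return entity
    return None


def split_bound_unbound(cf_prev: List[str], cf_cur: List[str]) -> Tuple[List[str], List[str]]:
    """Separate bound (thematic) from unbound (rhematic) elements of Cf(U_n)."""
    previous = set(cf_prev)
    bound = [e for e in cf_cur if e in previous]
    unbound = [e for e in cf_cur if e not in previous]
    return bound, unbound


cf_2a = ["DELL-316LT", "NIMH-ACCU"]
cf_2b = ["NIMH-ACCU", "DELL-316LT", "TIME-UNIT-PAIR", "POWER"]

print(backward_looking_center(cf_2a, cf_2b))  # DELL-316LT
print(preferred_center(cf_2b))                # NIMH-ACCU
print(split_bound_unbound(cf_2a, cf_2b))
```

For (2b) the sketch yields DELL-316LT as given information and NIMH-ACCU as theme, while the time expression and POWER come out as unbound, i.e., rhematic elements.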

An alternative definition of theme and rheme in the 
context of the centering approach is proposed by Ram- 
bow (1993). In his approach the theme corresponds to 
the Cb and the theme/rheme hierarchy can be derived 
from those elements of $C_f(U_{n-1})$ that are realized in
$U_n$. Rambow does not distinguish, however, between
the information structure and the thematic structure
of utterances, which leads to problems when a change
of the criteria for recognizing the thematic structure is
envisaged. Our approach is flexible enough to accom-
modate other conceptions of theme/rheme as defined,
e.g., by Hajicova et al. (1995), since this change af- 
fects only the thematic but not the information struc- 
ture of utterances. 



bound element(s) $>_{IS_{base}}$ unbound element(s)

anaphora $>_{IS_{bound}}$
(possessive pronoun xor elliptical antecedent) $>_{IS_{bound}}$
(elliptical expression xor head of anaphoric expression)

nom head$_1$ $>_{prec}$ nom head$_2$ $>_{prec}$ ... $>_{prec}$ nom head$_n$



Table 2: Functional Ranking Constraints on the Cf



The rules holding for the ranking on the Cf, derived
from a German language corpus, are summarized in
Table 2. They are organized into three layers 2. At
the top level, $>_{IS_{base}}$ denotes the basic relation for the
overall ranking of information structure (IS) patterns.
Accordingly, any context-bound expression in the ut-
terance $U_{n-1}$ is given the highest preference as a po-
tential antecedent of an anaphoric or elliptical expres-
sion in $U_n$, while any unbound expression is ranked
next to context-bound expressions.

The second relation depicted in Table 2, $>_{IS_{bound}}$,
denotes preference relations dealing exclusively with
multiple occurrences of (resolved) anaphora, i.e.,
bound elements, in the preceding utterance. $>_{IS_{bound}}$
distinguishes among different forms of context-bound
elements (viz., anaphora, possessive pronouns and tex-
tual ellipses) and their associated preference order.
The final element of $>_{IS_{bound}}$ is either the elliptical
expression or the head of an anaphoric expression
which is used as a possessive determiner, a Saxon gen-
itive, a prepositional or a genitival attribute (cf. the
ellipsis in (2c): "die Ladezeit" (the charge time) vs.
"seine Ladezeit" (its charge time) or "die Ladezeit des
Akkus" (the accumulator's charge time)).

For illustration purposes, consider text fragment (1)
and the corresponding Cb/Cf data in Table 3 (cf. footnote 3): In (1d)
the pronoun "er" (it) might be resolved to "Akku"
(accumulator) or "Rechner" (computer), since both 
fulfill the agreement condition for pronoun resolu- 
tion. Now, "der Rechner" (computer) figures as a 
nominal anaphor, already resolved to DELL-316LT, 
while "Akku" (accumulator) is only the antecedent 
of the elliptical expression "der Entleerung" (dis- 
charge). Therefore, the preferred antecedent of "er" 
(it) is determined as Rechner (computer). 
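
The resolution step just described can be sketched as a search through the ranked Cf of the preceding utterance for the first candidate that satisfies the agreement condition. The Python code below is a simplified illustration, not the resolution module of the parser: the `Candidate` record, the function name and the gender/number annotations are assumptions added here, and conceptual constraints are ignored altogether.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """Hypothetical Cf entry annotated with agreement features."""
    concept: str
    gender: str   # grammatical gender of the German noun
    number: str   # "sg" or "pl"


def resolve_pronoun(cf_prev: List[Candidate], gender: str, number: str) -> Optional[str]:
    """Return the most highly ranked element of Cf(U_{n-1}) that agrees with the pronoun."""
    for candidate in cf_prev:  # cf_prev is ordered, highest ranked first
        if candidate.gender == gender and candidate.number == number:
            return candidate.concept
    return None


# Cf of (1c) in the order given in Table 3.
cf_1c = [
    Candidate("DELL-316LT", "masc", "sg"),     # der Rechner
    Candidate("ACCU", "masc", "sg"),           # der Akku
    Candidate("DISCHARGE", "fem", "sg"),       # die Entleerung
    Candidate("TIME-UNIT-PAIR", "fem", "pl"),  # 30 Minuten
    Candidate("TIME-UNIT-PAIR", "fem", "pl"),  # 5 Sekunden
]

# "er" in (1d) is masculine singular.
print(resolve_pronoun(cf_1c, "masc", "sg"))  # DELL-316LT
```

Both DELL-316LT and ACCU pass the agreement check, so the outcome depends entirely on the ranking of the Cf, which is the point of the example.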

The bottom level of Table 2 specifies $>_{prec}$ which
covers the preference order for multiple occurrences 
of the same type of any information structure pattern, 
e.g., the occurrence of two anaphora or two unbound 
elements (all heads in an utterance are ordered by 
linear precedence relative to their text position). In 
sentence (2b), two nominal anaphors occur, "Akku" 
(accumulator) and "Rechner" (computer). The tex- 
tual ellipsis "Ladezeit" (charge time) in (2c) has to 
be resolved to the most preferred element of the Cf
of (2b), viz. the entity denoted by "Akku" (accumula- 
tor) (cf. Table 4). Note that "Rechner" (computer) is 
the subject of the sentence, though it is not the pre- 
ferred antecedent, since "Akku" (accumulator) pre- 
cedes "Rechner" (computer) and is anaphoric as well. 



disregarding coordinations, the ordering we propose in- 
duces a strict ordering on the entities in a center list. 

3 Minuten (minutes) is excluded from the Cf for reasons 
concerning the processing of complex sentences (cf. Strube 
(1996)). 



role patterns to more adequately account for the or- 
dering of discourse entities in center lists. In Section 
3 we elaborate on the particular information structure 
criteria underlying a function-based center ordering. 
We also make a second, even more general method- 
ological claim for which we have gathered some pre- 
liminary, though still not conclusive evidence. Based 
on a re-evaluation of empirical arguments discussed 
in the literature on centering, we stipulate that ex- 
changing grammatical by functional criteria is also a 
reasonable strategy for fixed word order languages. 
Grammatical role constraints can indeed be rephrased 
by functional ones, which is simply due to the fact 
that grammatical roles and the information structure 
patterns, as we define them, coincide in these kinds 
of languages. Hence, the proposal we make seems 
more general than the ones currently under discus- 
sion in that, given a functional framework, fixed and 
free word order languages can be accounted for by the 
same ordering principles. As a consequence, we argue 
against Walker et al.'s (1994, p.227) stipulation, which 
assumes that the Cf ranking is the only parameter of 
the centering theory which is language-dependent. In- 
stead, we claim that functional centering constraints 
for the Cf ranking are possibly universal.

The second major contribution of this paper is re- 
lated to the unified treatment of specific text phe- 
nomena. It consists of an equally balanced treatment 
of intersentential (pro)nominal anaphora and textual 
ellipsis (also called functional or partial anaphora). 
The latter phenomenon (cf. the examples given in the 
next section), in particular, is usually only sketchily 
dealt with in the centering literature, e.g., by assert- 
ing that the entity in question "is realized but not di- 
rectly realized" (Grosz et al., 1995, p.217). Further- 
more, the distinction between those two kinds of re- 
alization is generally delegated to the underlying se- 
mantic theory. We will develop arguments how to lo- 
cate elliptical discourse entities and resolve textual el- 
lipsis properly at the center level. The ordering con- 
straints we supply account for all of the above men- 
tioned types of anaphora in a precise way, includ- 
ing (pro)nominal anaphora (Strube & Hahn, 1995; 
Hahn & Strube, 1996). This claim will be validated 
by a substantial body of empirical data (cf. Section 4). 

2 Types of Anaphora Considered 

Text phenomena, e.g., textual forms of ellipsis and 
anaphora, are a challenging issue for the design of 
parsers for text understanding systems, since imper-
fect recognition facilities result in either referentially
incoherent or invalid text knowledge representations.
At the conceptual level, textual ellipsis relates a quasi- 
anaphoric expression to its extrasentential antecedent 
by conceptual attributes (or roles) associated with that 



antecedent (see, e.g., the relation between "Akkus"
(accumulator) and "316LT", a particular notebook, in
(1b) and (1a)). Thus, it complements the phenomenon
of nominal anaphora, where an anaphoric expression
is related to its antecedent in terms of conceptual gen-
eralization (as, e.g., "Rechner" (computer) in (1c)
refers to "316LT" in (1a) mediated by the textual ellip-
sis in (1b)). The resolution of text-level nominal (and
pronominal) anaphora contributes to the construction 
of referentially valid text knowledge bases, while the 
resolution of textual ellipsis yields referentially coher- 
ent text knowledge bases. 

(1) a. Ein Reserve-Batteriepack versorgt den 316LT ca.
2 Minuten mit Strom.
(A reserve battery pack - supplies - the 316LT -
for approximately 2 minutes - with power.)

b. Der Status des Akkus wird dem Anwender angezeigt.
(The status of the accumulator - is - to the user -
indicated.)

c. Ca. 30 Minuten vor der Entleerung beginnt der
Rechner 5 Sekunden zu beepen.
(Approximately 30 minutes - before the discharge
- starts - the computer - for 5 seconds - to beep.)

d. 5 Minuten bevor er sich ausschaltet, fängt die
Low-Battery-LED an zu blinken.
(5 minutes - before - it - itself - turns off - begins
- the low-battery-LED - to flash.)

In the case of textual ellipsis, the missing concep- 
tual link between two discourse elements occurring in 
adjacent utterances must be inferred in order to estab- 
lish the local coherence of the discourse (for an early 
statement of that idea, cf. Clark (1975)). In the sur-
face form of utterance (1b) the information is missing
that "Akkus" (accumulator) links up with "316LT".
This relation can only be made explicit if conceptual
knowledge about the domain, viz. the relation part-of
between the concepts ACCUMULATOR and 316LT, is
available (see Hahn et al. (1996) for a more detailed 
treatment of text ellipsis resolution). 

3 Principles of Functional Centering 

Within the framework of the centering model 
(Grosz et al., 1995), we distinguish each utterance's 
backward-looking center ($C_b(U_n)$) and its forward-
looking centers ($C_f(U_n)$). The ranking imposed on
the elements of the Cf reflects the assumption that the
most highly ranked element of $C_f(U_n)$ - the preferred
center $C_p(U_n)$ - is the most preferred antecedent of
an anaphoric or elliptical expression in $U_{n+1}$, while
the remaining elements are partially ordered accord- 
ing to decreasing preference for establishing referen- 
tial links. Hence, the most important single construct 
of the centering model is the ordering of the list of 
forward-looking centers (Walker et al., 1994). 
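
To make the role of the ordering concrete, the following sketch treats the forward-looking centers of an utterance as a ranked list and resolves an expression in the next utterance by walking that list from the most preferred element downwards. The `Entity` class, the gender-agreement test and the particular order chosen for utterance (1c) are assumptions made for illustration only; they are not the ranking criteria proposed in this paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Entity:
    name: str
    gender: str  # grammatical gender of the German noun: "m", "f" or "n"


def preferred_center(cf: List[Entity]) -> Optional[Entity]:
    """The preferred center is the most highly ranked forward-looking center."""
    return cf[0] if cf else None


def resolve(cf_prev: List[Entity],
            is_compatible: Callable[[Entity], bool]) -> Optional[Entity]:
    """Return the highest-ranked center of the previous utterance that
    satisfies the agreement constraints of the expression to resolve."""
    for entity in cf_prev:
        if is_compatible(entity):
            return entity
    return None


# Forward-looking centers for utterance (1c); this order is assumed.
cf_1c = [Entity("Rechner", "m"), Entity("Entleerung", "f")]

# The pronoun "er" in (1d) is masculine, so it needs a masculine antecedent.
antecedent = resolve(cf_1c, lambda e: e.gender == "m")
print(preferred_center(cf_1c).name, antecedent.name)  # Rechner Rechner
```

Because the list is searched in rank order, any change to the ordering criteria directly changes which antecedent is tried first, which is why the ordering is the central construct.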

Abstract 

Based on empirical evidence from a free 
word order language (German) we propose a 
fundamental revision of the principles guid- 
ing the ordering of discourse entities in the 
forward-looking centers within the center- 
ing model. We claim that grammatical role 
criteria should be replaced by indicators 
of the functional information structure of 
the utterances, i.e., the distinction between 
context-bound and unbound discourse ele- 
ments. This claim is backed up by an empir- 
ical evaluation of functional centering. 

1 Introduction 

The centering model has evolved as a methodology for 
the description and explanation of the local coherence 
of discourse (Grosz et al., 1983; 1995), with focus on 
pronominal and nominal anaphora. Though several 
cross-linguistic studies have been carried out (cf. the 
enumeration in Grosz et al. (1995)), an almost canon- 
ical scheme for the ordering on the forward-looking 
centers has emerged, one that reflects well-known reg- 
ularities of fixed word order languages such as En- 
glish. With the exception of Walker et al. (1990; 
1994) for Japanese, Turan (1995) for Turkish, Ram- 
bow (1993) for German and Cote (1996) for English, 
only grammatical roles are considered and the (partial) 
ordering in Table 1¹ is taken for granted. 



subject > dir-object > indir-object > complement(s) > adjunct(s) 

Table 1: Grammatical Role Based Ranking on the Cf 

¹ Table 1 contains the most explicit ordering of grammatical 
roles we are aware of and has been taken from Brennan 
et al. (1987). Often, the distinction between complements 
and adjuncts is collapsed into the category "others" 
(cf., e.g., Grosz et al. (1995)). 
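
As a point of reference, the grammatical-role ranking of Table 1 can be sketched as a simple sort key. This is the conventional scheme the paper takes issue with, not the functional ordering it proposes. The role labels attached to the entities of utterance (1a) are assigned by hand here and are assumptions; the function and variable names are hypothetical.

```python
# Rank positions following Table 1: lower numbers are more preferred.
ROLE_RANK = {
    "subject": 0,
    "dir-object": 1,
    "indir-object": 2,
    "complement": 3,
    "adjunct": 4,
}


def rank_by_grammatical_role(entities):
    """Order (name, role) pairs by the ranking of Table 1.

    Python's sort is stable, so entities sharing a role keep their input
    order, which leaves the ordering partial within a role."""
    return sorted(entities, key=lambda item: ROLE_RANK[item[1]])


# Entities of utterance (1a), deliberately listed out of rank order.
# The role labels are hand-assigned for this illustration.
entities_1a = [
    ("316LT", "dir-object"),
    ("Strom", "complement"),
    ("Reserve-Batteriepack", "subject"),
]

cf_1a = [name for name, _ in rank_by_grammatical_role(entities_1a)]
print(cf_1a)  # ['Reserve-Batteriepack', '316LT', 'Strom']
```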



Our work on the resolution of anaphora (Strube & 
Hahn, 1995; Hahn & Strube, 1996) and textual el- 
lipsis (Hahn et al., 1996), however, is based on Ger- 
man, a free word order language, in which grammat- 
ical role information is far less predictive for the or- 
ganization of centers. Rather, for establishing proper 
referential relations, the functional information struc- 
ture of the utterances becomes crucial (different per- 
spectives on functional analysis are brought forward 
in Daneš (1974b) and Dahl (1974)). We share the notion 
of functional information structure as developed 
by Daneš (1974a). He distinguishes between two crucial 
dichotomies, viz. given information vs. new information 
(constituting the information structure of utterances) 
on the one hand, and theme vs. rheme on 
the other (constituting the thematic structure of utterances; 
cf. Halliday & Hasan (1976, pp. 325-6)). Daneš 
refers to a definition given by Halliday (1967) to avoid 
the confusion likely to arise in the use of these terms: 
"[...] while given means what you were talking about 
(or what I was talking about before), theme means 
what I am talking about (now) [...]" (Halliday, 1967, 
p. 212). Daneš concludes that the distinction between 
given information and theme is justified, while the dis- 
tinction between new information and rheme is not. 
Thus, we arrive at a trichotomy between given infor- 
mation, theme and rheme (the latter being equivalent 
to new information). We here subscribe to these con- 
siderations, too, and will return in Section 3 to these 
notions in order to rephrase them more explicitly by 
using the terminology of the centering model. 

In this paper, we intend to make two contributions 
to the centering approach. The first one, the intro- 
duction of functional notions of information structure 
in the centering model, is methodological in nature. 
The second one concerns an empirical issue in that we 
demonstrate how a functional model of centering can 
successfully be applied to the analysis of several forms 
of anaphoric text phenomena. 

At the methodological level, we develop arguments 
that (at least for free word order languages) grammat- 
ical role indicators should be replaced by functional 